Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caddyhttp: Add MatchWithError to replace SetVar hack #6596

Merged
merged 11 commits into from
Nov 4, 2024

Conversation

francislavoie
Copy link
Member

As recently discussed on Slack, our RequestMatcher interface is sometimes insufficient in situations where matchers cannot safely operate because it can only return a boolean. Since matchers make a decision whether or not handlers can run, returning false isn't a secure decision because it may cause handlers that must run (e.g. authentication) to not get run, causing possible bypass of security-critical functionality in the config.

We already had a mechanism for this, which I added in #4346 via putting a variable in Context and reading it back after the matcher ran. That was absolutely a hack and not the best solution. I couldn't think of a better way at the time (lack of Go skills 😅)

To solve this, the better solution is to introduce a new RequestMatcherWithError interface which returns (bool, error), so that an error can be returned to cancel the HTTP request pipeline if the matcher's preconditions aren't met. For now, we can make this optional (so if a matcher implements this interface we call it if possible) until we know all matcher plugins have caught up implementing with this (though unfortunately guaranteeing that is quite difficult and a multi-step process to clean up).

@francislavoie francislavoie added the feature ⚙️ New feature or request label Sep 29, 2024
Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly LGTM. But also, don't we want to remove the error returned in the context since it's no longer needed?

modules/caddyhttp/ip_matchers.go Outdated Show resolved Hide resolved
@francislavoie
Copy link
Member Author

I kept the error in context for full backwards compat for places that still call Match. I think we need to do this in a few steps over the course of a few releases

@mholt
Copy link
Member

mholt commented Sep 30, 2024

I kept the error in context for full backwards compat for places that still call Match.

Does anyone really know about or use that though?

Anyway, when you're ready to have this merged, go ahead and mark as ready. I'm OK with it if there's another commit to actually use the error return value gets into this PR, or we can do that later.

@francislavoie
Copy link
Member Author

Does anyone really know about or use that though?

I'll do some searches with SourceGraph just in case.

@francislavoie
Copy link
Member Author

Plugged the IP matcher hole, updated the tests to use MatchWithError instead, everywhere.

if r.TLS != nil && !r.TLS.HandshakeComplete {
return false // if handshake is not finished, we infer 0-RTT that has not verified remote IP; could be spoofed
return false, fmt.Errorf("TLS handshake not complete, remote IP cannot be verified")
Copy link
Member Author

@francislavoie francislavoie Oct 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mholt should this be a caddyhttp.Error? If so, what HTTP status code should we use? I think as-is it would be a 500 without being otherwise specified but I'm not sure if that's correct. I don't know how it should behave on an incomplete handshake. Is there some 4xx series code which can instruct the client to try again or w/e?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some 4xx series code which can instruct the client to try again or w/e?

@francislavoie, I was able to work around the issue mentioned here #6427 (comment) with this in front of my ip matchers.

@early tls early_data
error @early 425

Before I have added this, I was able to reproduce my issue where the IP matcher "failed" (returned false for my IP address) after approx. 10 minutes of inactivtiy and then pressing F5 to refresh the page.
After have this added, I was no longer able to reproduce the issue.

Based on that experience, I would think the HTTP status code 425 is the desirable one here.

Copy link
Member Author

@francislavoie francislavoie Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, good to know! I'm away from home today, would you be willing to checkout this branch, change this line to a caddyhttp.Error, build it and test it? Would be nice to confirm. 425 makes sense according to https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/425 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try it - just a bit difficult in that particular environment, because my 2.9.0 tests require the "promote-metrics" branch. But I have other hosts for testing, where I can disable metrics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good 👍
My change was this: matcher-with-error...steffenbusch:caddy:matcher-with-error

Before:

With 2.9.0-beta from "promote-metrics" branch, which means without your change, I got the undesired 404 because the IP-matcher was not matching with my IP address; followed by a success on a F5:

tail -f access.json | jq '{user_id, status, ts, tls: .request.tls, remote_ip: .request.remote_ip, uri: .request.uri}'

{
  "user_id": "",
  "status": 404,
  "ts": 1728816499.5479946,
  "tls": {
    "resumed": true,
    "version": "TLSv1.3",
    "cipher_suite": "TLS_AES_128_GCM_SHA256",
    "proto": "h3",
    "server_name": "rambo.stbu.net"
  },
  "remote_ip": "95.89.172.31",
  "uri": "/test/ip"
}
{
  "user_id": "",
  "status": 200,
  "ts": 1728816503.3737376,
  "tls": {
    "resumed": true,
    "version": "TLSv1.3",
    "cipher_suite": "TLS_AES_128_GCM_SHA256",
    "proto": "h3",
    "server_name": "rambo.stbu.net"
  },
  "remote_ip": "95.89.172.31",
  "uri": "/test/ip"
}

After:

Building it based on your branch with the httperror 425 tweak, it was working in my browser, and this was logged in access.json:

{
  "user_id": "",
  "status": 425,
  "ts": 1728817672.261492,
  "tls": {
    "resumed": true,
    "version": "TLSv1.3",
    "cipher_suite": "TLS_AES_128_GCM_SHA256",
    "proto": "h3",
    "server_name": "rambo.stbu.net"
  },
  "remote_ip": "95.89.172.31",
  "uri": "/test/ip-chrome"
}
{
  "user_id": "",
  "status": 200,
  "ts": 1728817672.265926,
  "tls": {
    "resumed": true,
    "version": "TLSv1.3",
    "cipher_suite": "TLS_AES_128_GCM_SHA256",
    "proto": "h3",
    "server_name": "rambo.stbu.net"
  },
  "remote_ip": "95.89.172.31",
  "uri": "/test/ip-chrome"
}

The 425 is logged, but I have not observed any issue in my Chrome Browser (maybe a little bit longer the "pending" than usually) and saw the result of the request immediately, which was logged at 200 instantly after the 425.

However, I noticed a build-error for this branch related to github.com/mholt/caddy-ratelimit:

2024/10/13 11:27:46 [INFO] exec (timeout=0s): /usr/local/go/bin/go build -o /usr/local/bin/caddy -ldflags -w -s -trimpath -tags nobadger
# github.com/mholt/caddy-ratelimit
/root/go/pkg/mod/github.com/mholt/caddy-ratelimit@v0.0.0-20240828171918-12435ecef5db/handler.go:176:7: multiple-value rl.matcherSets.AnyMatch(r) (value of type (bool, error)) in single-value context
2024/10/13 11:29:15 [INFO] Cleaning up temporary folder: /tmp/buildenv_2024-10-13-1126.189145495
2024/10/13 11:29:15 [FATAL] exit status 1
Error: building at STEP "RUN go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest &&     /usr/local/bin/xcaddy build --output /usr/local/bin/caddy       --with github.com/caddyserver/caddy/v2=github.com/caddyserver/caddy/v2@matcher-with-error       --with github.com/caddyserver/transform-encoder       --with github.com/ueffel/caddy-tls-format       --with github.com/muety/caddy-remote-host       --with github.com/ggicci/caddy-jwt       --with github.com/caddy-dns/ionos       --with github.com/ueffel/caddy-brotli       --with github.com/aksdb/caddy-cgi/v2       --with github.com/mholt/caddy-webdav       --with github.com/mholt/caddy-ratelimit &&     /usr/local/bin/caddy version": while running runtime: exit status 1

For the above tests, I have removed the ratelimit plugin to get it compiled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks! Will apply the fix later today. Good callout on AnyMatch with ratelimit, I assumed no plugins were using that so I made a more aggressive API change. I'll fix that up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay the AnyMatch issue should be fixed now, and changed to use 425 status, if you want to confirm I did it right :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@francislavoie

should this be a caddyhttp.Error? If so, what HTTP status code should we use? I think as-is it would be a 500 without being otherwise specified but I'm not sure if that's correct. I don't know how it should behave on an incomplete handshake. Is there some 4xx series code which can instruct the client to try again or w/e?

I'm not really sure :( The problem is that neither the client nor the server have done anything particularly wrong... there is no actual error. We simply do not have a value that can be trusted.

You could argue it's a 4xx error since the client should not have done a 0-RTT, but does it know that? Probably not.

You could argue it's a 5xx error since the server can't correctly do what it was asked, but is that its fault? Not really.

I don't believe there's any HTTP status that can cause the TLS connection to be redone. 500 is the best all around option when you don't know what other status code to use. But I still don't like the idea of an HTTP error when no HTTP error has occurred 😕

It's possible (?) that a 426 Upgrade Required could cause the client to break the TLS connection and re-establish it if sent with a Connection: close header instead of Connection: Upgrade, but I have no idea honestly. Real-world testing would be required.

This is kinda why I didn't implement errors with matchers at first 🫠

Copy link
Member

@mohammed90 mohammed90 Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could argue it's a 4xx error since the client should not have done a 0-RTT, but does it know that? Probably not.

Per the RFC, 425 is the correct status, as mentioned earlier:

The server can cause a client to retry individual requests and
not use early data by responding with the 425 (Too Early) status
code (Section 5.2) in cases where the risk of replay is judged
too great.

There are some caveates such as

Note that a server cannot choose to selectively reject early data at
the TLS layer. TLS only permits a server to either accept all early
data or none of it. Once a server has decided to accept early data,
it MUST process all requests in early data, even if the server
rejects the request by sending a 425 (Too Early) response.

A server cannot make a request that contains the Early-Data header
field safe for processing by waiting for the handshake to complete.
A request that is marked with Early-Data was sent in early data on a
previous hop. Requests that contain the Early-Data header field and
cannot be safely processed MUST be rejected using the 425 (Too Early)
status code.

The RFC also says the clients must handle 425 properly by establishing a new connection to the server and retry the request.

@francislavoie
Copy link
Member Author

@mholt I've changed enough in the past few commits that I think it warrants another review.

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thoughtful work on this. I don't love that we have (?) to do this, but I guess that's how it is.

If we are going to make this an actual breaking change in the future, I'd rather just put an error return on the signature of Match.

// An error during matching will abort the request middleware chain and
// invoke the error middleware chain.
//
// This will eventually replace RequestMatcher. Matcher modules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda bummed because if we did this I would want it to be Match(*http.Request) (bool, error).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I don't see how. It's like DialContext etc. Something to redo in v3 lmao

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but if we remove this interface, it'll be a breaking change anyway, so might as well break it into something that we want right? 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can:

  • Add RequestMatcherWithError (this)
  • Give plugins time to implement it, all plugins should stop using the interface guard for RequestMatcher
  • Later, change the signature of RequestMatcher to Match(*http.Request) (bool, error) (should be safe since plugins no longer use it)
  • Deprecate RequestMatcherWithError, preferring RequestMatcher if implemented
  • Remove RequestMatcherWithError

All this might take months or years but that would be the path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I'm thinking too. It'll kind of suck, but... might be worth it.

@francislavoie francislavoie enabled auto-merge (squash) November 4, 2024 23:13
@francislavoie francislavoie merged commit 09b2cbc into master Nov 4, 2024
32 of 33 checks passed
@francislavoie francislavoie deleted the matcher-with-error branch November 4, 2024 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature ⚙️ New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants